Comparison between the sexes (Male vs Female) for the ages and microbiota

Table of content

  1. Differentially expressed genes
  2. Stats about the DEG
  3. DEG with significant p-value and fold change
    1. Log2FC
    2. Z-score
  4. Co-expression (WGCNA))
    1. Z-score in modules
    2. Genes in modules
  5. Enrichment analysis
    1. GO analysis
    2. KEGG pathways

Generated from a Jupyter Notebook - Sources

Loads

Libraries and functions

In [1]:
source("load_libraries.R")
Warning message:
“package ‘reshape2’ was built under R version 3.6.3”
Warning message:
“package ‘XML’ was built under R version 3.6.3”
Allowing multi-threading with up to 4 threads.
Warning message:
“package ‘reshape’ was built under R version 3.6.3”
Warning message:
“package ‘plotly’ was built under R version 3.6.3”
Warning message:
“package ‘tibble’ was built under R version 3.6.3”
preparing gene to GO mapping data...

preparing IC data...

preparing gene to GO mapping data...

preparing IC data...

preparing gene to GO mapping data...

preparing IC data...

In [2]:
source("functions.R")

Data

In [3]:
load("../results/dge/gene_length.RData")
load("../results/dge/filtered_metadata.RData")
load("../results/dge/dge.RData")
load("../results/dge/contrasts.RData")
load("../results/dge/filtered_norm_counts.RData")
load("../results/dge/filtered_z_scores.RData")
load("../results/dge/dge_net_pal2.RData")
load("../results/dge/col_order.RData")
load("../results/dge/annot_col.RData")
load("../results/dge/annot_colors.RData")
load("../results/dge/genes_in_modules.RData")
load("../results/dge/all_deg_genes.RData")
In [4]:
dir_path = "sex-effect/sex-microbiota-age/"

Differentially expressed genes

Extract DEG between Male and Female for the different ages and microbiota combinations

  • Threshold for adjusted p-value: 0.05
  • Threshold for adjusted significant fold change: 1.5

Table with the factors / constrasts

In [5]:
sub_contrasts = contrasts %>%
    filter(Info == 'Male vs Female (SPF, Young)' | 
           Info == 'Male vs Female (GF, Young)' | 
           Info == 'Male vs Female (SPF, Middle-aged)' |  
           Info == 'Male vs Female (GF, Middle-aged)' |  
           Info == 'Male vs Female (SPF, Old)' |  
           Info == 'Male vs Female (GF, Old)')
sub_contrasts
A data.frame: 6 × 11
InfoInterceptMale vs FemaleGF vs SPFMiddle-aged vs YoungOld vs YoungMale & Middle-agedMale & OldMale & GFGF & Middle-agedGF & Old
<chr><chr><chr><chr><chr><chr><chr><chr><chr><chr><chr>
Male vs Female (SPF, Young) 0100000000
Male vs Female (GF, Young) 0100000100
Male vs Female (SPF, Middle-aged)0100010000
Male vs Female (GF, Middle-aged) 0100010100
Male vs Female (SPF, Old) 0100001000
Male vs Female (GF, Old) 0100001100
In [6]:
deg_results = lapply(sub_contrasts$Info, function(x) get_dge_results(x, dge, sub_contrasts))
names(deg_results) = sub_contrasts$Info

Extract the log2FC of the DEG

In [7]:
deg = extract_DEG_log2FC(deg_results, dir_path)
Warning message:
“`data_frame()` is deprecated as of tibble 1.1.0.
Please use `tibble()` instead.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated.
In [8]:
all_deg_genes = unique(c(all_deg_genes, deg$sign_fc_deg$genes))
save(all_deg_genes, file="../results/dge/all_deg_genes.RData")

Stats

In [9]:
deg = extract_DEG_stats(deg, dir_path)
deg$stat
Warning message:
“funs() is soft deprecated as of dplyr 0.8.0
Please use a list of either functions or lambdas: 

  # Simple named list: 
  list(mean = mean, median = median)

  # Auto named with `tibble::lst()`: 
  tibble::lst(mean, median)

  # Using lambdas
  list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
This warning is displayed once per session.”
Using type as id variables

A tibble: 6 × 6
Male vs Female (SPF, Young)Male vs Female (GF, Young)Male vs Female (SPF, Middle-aged)Male vs Female (GF, Middle-aged)Male vs Female (SPF, Old)Male vs Female (GF, Old)
<int><int><int><int><int><int>
All DEG (Wald padj < 0.05)2414303735673593261227
All over-expressed genes (Wald padj < 0.05 & FC > 0)1338160018221802141112
All under-expressed genes (Wald padj < 0.05 & FC < 0)1076143717451791120115
DEG (Wald padj < 0.05 & abs(FC) >= 1.5)1291142916531642116143
Over-expressed genes (Wald padj < 0.05 & FC >= 1.5)1085118412711233 65 64
Under-expressed genes (Wald padj < 0.05 & FC <= -1.5) 206 245 382 409 51 79

All DEG (Wald padj < 0.05)

In [10]:
plot_sign_DEG_upset(deg)

DEG (Wald padj < 0.05 & abs(FC) > 1.5)

In [11]:
pdf(paste('../results/dge/', dir_path, '/sign_FC_DEG_upset.pdf', sep=''))
plot_sign_FC_DEG_upset(deg)
dev.off()
plot_sign_FC_DEG_upset(deg)
pdf: 2

DEG (Wald padj < 0.05 & abs(FC) > 1.5)

Log2FC

In [12]:
data = deg$sign_fc_deg %>% select(-genes)
fc_annot = data.frame(Comparison = rep("Male vs Female", 6),
                      Age = c(rep("Young", 2), rep("Middle-aged", 2), rep("Old", 2)),
                      Microbiota = rep(c("SPF","GF"), 3),
                      row.names = colnames(data))
plot_fc_heatmap(data, fc_annot)

Z-score

In [13]:
comps = list(
    "Male vs Female (SPF, Young)" = c(grep("SPF_8w_F_+", colnames(norm_counts), perl=TRUE, value=TRUE),
                  grep("SPF_8w_M_+", colnames(norm_counts), perl=TRUE, value=TRUE)),
    "Male vs Female (SPF, Middle-aged)" = c(grep("SPF_52w_F_+", colnames(norm_counts), perl=TRUE, value=TRUE),
                  grep("SPF_52w_M_+", colnames(norm_counts), perl=TRUE, value=TRUE)),
    "Male vs Female (SPF, Old)" = c(grep("SPF_104w_F_+", colnames(norm_counts), perl=TRUE, value=TRUE),
                  grep("SPF_104w_M_+", colnames(norm_counts), perl=TRUE, value=TRUE)),
    "Male vs Female (GF, Young)" = c(grep("GF_8w_F_+", colnames(norm_counts), perl=TRUE, value=TRUE),
                  grep("GF_8w_M_+", colnames(norm_counts), perl=TRUE, value=TRUE)),
    "Male vs Female (GF, Middle-aged)" = c(grep("GF_52w_F_+", colnames(norm_counts), perl=TRUE, value=TRUE),
                  grep("GF_52w_M_+", colnames(norm_counts), perl=TRUE, value=TRUE)),
    "Male vs Female (GF, Old)" = c(grep("GF_104w_F_+", colnames(norm_counts), perl=TRUE, value=TRUE),
                  grep("GF_104w_M_+", colnames(norm_counts), perl=TRUE, value=TRUE))
)

Column order: age - microbiota - sex

In [14]:
plot_z_score_heatmap(z_scores,
                     deg$sign_fc_deg$genes,
                     col_order$ams,
                     annot_col$ams,
                     "All DE genes in Male vs Female for any microbiota and age",
                     col_order$ams)
pdf(paste('../results/dge/', dir_path, '/z_score_ams.pdf', sep=''))
plot_z_score_heatmap(z_scores,
                     deg$sign_fc_deg$genes,
                     col_order$ams,
                     annot_col$ams,
                     "All DE genes in Male vs Female for any microbiota and age",
                     col_order$ams)
dev.off()
pdf: 3
In [15]:
for(comp in names(comps)){
    plot_z_score_heatmap(z_scores,
        deg$sign_fc_deg %>% filter(!is.na((!!as.name(comp)))) %>% pull(genes),
        col_order$ams,
        annot_col$ams,
        paste("DE genes in", comp),
        comps[[comp]])
}

Column order: microbiota - age - sex

In [16]:
plot_z_score_heatmap(z_scores,
                     deg$sign_fc_deg$genes,
                     col_order$mas,
                     annot_col$mas,
                     "All DE genes in Male vs Female for any microbiota and age",
                     col_order$mas)
pdf(paste('../results/dge/', dir_path, '/z_score_mas.pdf', sep=''))
plot_z_score_heatmap(z_scores,
                     deg$sign_fc_deg$genes,
                     col_order$mas,
                     annot_col$mas,
                     "All DE genes in Male vs Female for any microbiota and age",
                     col_order$mas)
dev.off()
pdf: 3
In [17]:
for(comp in names(comps)){
    plot_z_score_heatmap(z_scores,
        deg$sign_fc_deg %>% filter(!is.na((!!as.name(comp)))) %>% pull(genes),
        col_order$mas,
        annot_col$mas,
        paste("DE genes in", comp),
        comps[[comp]])
}

Co-expression (WGCNA)

Z-score in modules

In [18]:
comps = names(comps)

Column order: age - microbiota - sex

In [19]:
plot_z_score_heatmap_with_modules(z_scores,
                                  rownames(z_scores),
                                  col_order$ams,
                                  annot_col$ams,
                                  genes_in_modules,
                                  "All genes")
In [20]:
for(comp in comps){
    plot_z_score_heatmap_with_modules(z_scores,
        deg$sign_fc_deg %>% filter(!is.na((!!as.name(comp)))) %>% pull(genes),
        col_order$ams,
        annot_col$ams,
        genes_in_modules,
        paste("DE genes in", comp))
}

Column order: microbiota - age - sex

In [21]:
plot_z_score_heatmap_with_modules(z_scores,
    rownames(z_scores),
    col_order$mas,
    annot_col$mas,
    genes_in_modules,
    "All genes")
In [22]:
for(comp in comps){
    plot_z_score_heatmap_with_modules(z_scores,
        deg$sign_fc_deg %>% filter(!is.na((!!as.name(comp)))) %>% pull(genes),
        col_order$mas,
        annot_col$mas,
        genes_in_modules,
        paste("DE genes in", comp))
}

Genes in modules

In [23]:
for(comp in comps){
    plot_top_deg_in_modules(deg$sign_fc_deg, comp, genes_in_modules)
}
options(repr.plot.width=7, repr.plot.height=7)

Enrichment analysis

In [24]:
deg = fit_proba_weighting_function(deg, gene_length)
Warning message in pcls(G):
“initial point very close to some inequality constraints”
Warning message in pcls(G):
“initial point very close to some inequality constraints”
Warning message in pcls(G):
“initial point very close to some inequality constraints”
Warning message in pcls(G):
“initial point very close to some inequality constraints”
Warning message in pcls(G):
“initial point very close to some inequality constraints”
Warning message in pcls(G):
“initial point very close to some inequality constraints”

GO analysis

In [25]:
deg = extract_GO_terms(deg, dir_path)
Warning message in stack.default(getgo(l$sign_fc_deg$genes, "mm10", "geneSymbol")):
“non-vector elements will be ignored”

Biological process

Dot-plot with the 10 most significant p-values for the different comparison

In [26]:
plot_top_go(deg, "BP", 10)
Using category as id variables

Using category, type as id variables

Warning message:
“Column `variable` joining factors with different levels, coercing to character vector”

Cellular components

Dot-plot with the 10 most significant p-values for the different comparison

In [27]:
plot_top_go(deg, "CC", 10)
Using category as id variables

Using category, type as id variables

Warning message:
“Column `variable` joining factors with different levels, coercing to character vector”

Molecular functions

Dot-plot with the 10 most significant p-values for the different comparison

In [28]:
plot_top_go(deg, "MF", 10)
Using category as id variables

Using category, type as id variables

Warning message:
“Column `variable` joining factors with different levels, coercing to character vector”

GO networks

The edge colors in the tree represent the relationship between two nodes. - green: positively regulates

  • red: negatively regulates
  • black: regulates
  • blue: is a
  • light blue: part of

GO Trees at "../results/dge/type-effect/type_gender_age/go/"

In [29]:
for(comp in comps){
    print(comp)
    fn = gsub(' ', '_', comp)
    fn = gsub('(\\(|\\)|,)', '', fn)
    fp = paste('../results/dge/', dir_path, 'go/', fn, '.png', sep='')
    col = get_GO_network_col_all_ont(deg$GO, comp)
    dotRes = getAmigoTree(goIDs=col$category,
                          color=col$color,
                          filename=fp,
                          picType="png",
                          saveResult=TRUE)
}
[1] "Male vs Female (SPF, Young)"
Warning message:
“Column `values` has different attributes on LHS and RHS of join”
Warning message:
“Column `values` has different attributes on LHS and RHS of join”
Error in getAmigoTree(goIDs = col$category, color = col$color, filename = fp, : konnte Funktion "getAmigoTree" nicht finden
Traceback:

KEGG pathways

In [ ]:
deg = extract_KEGG_pathways(deg, dir_path)
In [ ]:
plot_kegg_pathways(deg$KEGG$over$category,
                   deg$sign_fc_deg,
                   paste('../results/dge/', dir_path, '/kegg/over_repr_kegg/', sep=''))

Pathway graphs available at ../results/dge/gender-effect/gender_type_age/over_repr_kegg/

In [ ]:
plot_kegg_pathways(deg$under_represented_KEGG[,"category"],
                   deg$fc_deg,
                   paste('../results/dge/', dir_path, '/kegg/under_repr_kegg/', sep=''))

Pathway graphs available at ../results/dge/gender-effect/gender_type_age/under_repr_kegg/

Citations

In [ ]:
citation("pheatmap")
In [ ]:
citation("RamiGO")
In [ ]:
citation("pathview")